Molecular Analysis

The present invention relates to a method of determining the sequence or occurrence
frequency of a number of variable gene inserts from a gene library, wherein each
variable gene insert is flanked 5' and 3' by known sequences, the method comprising:
conducting polymerase chain reaction to amplify the variable gene inserts to produce
components of a mixed PCR product, ligating the components of the mixed PCR
product to produce a concatenated sequence and sequencing or determining the
occurrence of the gene inserts in the concatenated sequence.


Background to the invention 

Peptide phage display is a prototypical version of directed in vitro molecular
evolution of a large combinatorial library by sequential rounds of physical selection
and enrichment. Peptide phage display selection methods have established
themselves as powerful tools for the identification of short linear peptide mimetics of
many ligand classes. Variant techniques have also been developed extending this
methodology to selection from large libraries of oligonucleotides (either random or
constrained), translation-arrested ribosomes, phage-displayed binding proteins (of
which single chain fragments of immunoglobulin are the largest group) and so on.

A major limitation of all these methods is that the complexity and composition of the
selection-evolved sub-library is assessed by analysis of a very small sample drawn at
random from this sub-library. In consequence, enrichment is deemed to have been
achieved when a very few sequences, or more usually one, dominate the sample. This
may be acceptable when the evolution is directed by a simple target with one or very
few binding sites, but severely limits the method if the target is complex.


The outcome of repeated rounds of selection from a large random peptide phage
display library (typically beginning with >10^9 different phage) is a reduced
complexity sub-library enriched for sequences showing specific affinity for the
selection matrix. Typically such a sub-library may contain 10^3 to 10^4 different phage,
but in all published studies the outcome of phage panning is assessed by sequencing
<20 independent clones.
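The sampling limitation described above can be made concrete with a short calculation. The following is a minimal sketch, assuming clones are drawn independently and at random from the sub-library; the function name and the example frequencies are illustrative only and do not come from the specification. Under that assumption, a sequence present at frequency f is absent from a sample of n clones with probability (1 - f)^n.

```python
def probability_missed(frequency: float, clones_sequenced: int) -> float:
    """Chance that a sequence present at `frequency` in the sub-library
    appears in none of `clones_sequenced` independently drawn clones.

    Assumes simple random sampling with independent draws (an assumption
    made for illustration, not a statement from the specification).
    """
    if not 0.0 <= frequency <= 1.0:
        raise ValueError("frequency must lie between 0 and 1")
    if clones_sequenced < 0:
        raise ValueError("clones_sequenced must be non-negative")
    return (1.0 - frequency) ** clones_sequenced


if __name__ == "__main__":
    # Illustrative frequencies only; 20 clones matches the "<20" figure above.
    for freq in (0.5, 0.1, 0.01, 0.001):
        missed = probability_missed(freq, 20)
        print(f"frequency {freq}: probability missed in 20 clones = {missed:.3f}")
```

On these assumptions a sequence making up 1% of the sub-library is missed in roughly four out of five samples of 20 clones, which illustrates why only dominant sequences are detected by conventional clone picking.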

The problems inherent in directed evolution with a complex target have already been
recognised in the field; a theoretical analysis has been presented (Vant Hull et al
(1998) J Mol Biol 278 597-597) and a partial practical solution suggested (Messmer
et al (2000) J Mol Biol 296 821-832). This "iterative panning" solution relies on
completing multiple rounds of directed evolution until a single 'winning' sequence
emerges. This sequence is then prepared as a synthetic peptide and used in the
blocking solution during a repeat of the whole experiment (i.e. a further set of
selections on new target material). Binding of the first class of peptides is now
blocked and a second 'winning' sequence is selected. This process can be continued
indefinitely, each entire round generating one new binding sequence. The method is
very slow, very expensive and probably impossible for many key complex biological
target materials, since it relies on a large supply of functionally homogeneous target
material (tumour tissue, infected cells etc) for many rounds of selection.

The present invention addresses the problems identified in the prior art. 

The present invention provides a method of determining the sequence and/or
occurrence frequency of a number of variable gene inserts from a gene library,
wherein each variable gene insert is flanked 5' and 3' by known sequences, the method
comprising:

conducting polymerase chain reaction to amplify the variable gene inserts to produce
components of a mixed PCR product;

ligating the components of the mixed PCR product to produce a concatenated
sequence; and

sequencing or determining the occurrence of the gene inserts in the concatenated
sequence.



In accordance with the invention, the method can be used for determining the 
sequence of a number of variable gene inserts or for determining the occurrence 
frequency of a number of gene inserts from a gene library. Preferably the gene library 
is a peptide phage display library. 

In the present invention, the polymerase chain reaction is conducted using primers
complementary to the known sequences 5' and 3' to the variable inserts. Further,
after the components of the mixed PCR product have been ligated to produce a
concatenated sequence, and before sequencing or determining the occurrence frequency
of the gene inserts, it is preferable to subclone the size-selected concatenated products
into a convenient vector for production of plasmid DNA suitable for automated
sequencing.

The present invention relates to a library (of any size, including a large library) which 
may have been selected or evolved by cycles of physical selection and amplification 
to generate a sub-library whose gene inserts encode sequences that share some desired 
binding property. 

The invention provides a simple and economical way of sequencing the relevant
variable parts of the gene encoding the phage coat protein from a large number
(preferably all) of the phage in the selected sub-library. This is in contrast to the prior
art, which only determined the sequence of a tiny and potentially unrepresentative
sub-set of the library. Furthermore, the present invention is a large-scale unbiased
analysis without plaque selection or phage DNA purification from selected plaques.
The method is achieved by using a polymerase chain reaction with unique primers
lying just 5' and just 3' to the variable insert in the gene encoding phage coat protein.
The reaction is carried out on pooled phage DNA isolated from an aliquot of the
library without plaque purification, and the product therefore contains all variable
regions in the library or sub-library, amplified in proportion to their representation.



The benefit of the present invention is that each variable region will have an
abundance in the double strand DNA product that is proportional to the abundance of
that insert sequence amongst the phage in the selected sub-library. When the
components of the mixed PCR product have been prepared, they are ligated to
produce a concatenated sequence. It may be useful or preferable to digest the
components of the mixed PCR product with a very infrequently cutting restriction
endonuclease before ligation to produce concatenated sequences. Following
production of the concatenated sequences, they may preferably be size selected to
around 1.5 kb before being cloned into a convenient plasmid. Subsequent sequencing
of such inserts generates greater than 30 variable insert sequences per sequencing
lane.
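The figures quoted above imply a bound on the size of each ligated unit: obtaining more than 30 inserts from a concatemer of about 1.5 kb requires each unit (variable insert plus whatever flanking sequence is retained) to be no longer than about 50 bp. The sketch below simply performs that arithmetic; the 45 bp unit length in the example is an assumed value chosen for illustration and is not stated in the specification.

```python
def inserts_per_concatemer(concatemer_bp: int, unit_bp: int) -> int:
    """Number of complete ligated units that fit in a concatemer.

    `unit_bp` is the length of one amplified unit (variable insert plus
    retained flanking sequence); its value is an assumption of the caller.
    """
    if concatemer_bp < 0:
        raise ValueError("concatemer_bp must be non-negative")
    if unit_bp <= 0:
        raise ValueError("unit_bp must be positive")
    return concatemer_bp // unit_bp


if __name__ == "__main__":
    # 1500 bp is the size-selection target given above; 45 bp is hypothetical.
    print(inserts_per_concatemer(1500, 45))
```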

Known software that is used to automatically strip the joining sequences out of
continuous DNA sequence to identify and then tabulate the di-tags during serial
analysis of ligand-selected peptide display sub-libraries can be used to generate
abundance histograms for all the insert sequences identified.
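The stripping and tabulation step can be sketched in a few lines. This is an illustrative outline only, not the known software referred to above: the flanking sequences, the fixed insert length and all function names are hypothetical placeholders, and real values depend on the library vector and primers. Because ligation can join units in either orientation, the sketch scans each read and its reverse complement.

```python
import re
from collections import Counter

# Hypothetical flanking sequences retained on each amplified unit; the real
# sequences depend on the vector and primers and are not given in the text.
LEFT_FLANK = "CTCTGCA"
RIGHT_FLANK = "GGTGGCT"
INSERT_LENGTH = 21  # assumed: a 7-residue peptide encoded by 21 bases

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_inserts(read, left=LEFT_FLANK, right=RIGHT_FLANK, length=INSERT_LENGTH):
    """Return the variable inserts found between the flanks in one read.

    Both strands are scanned so that units ligated in reverse orientation
    are recovered in the same (coding) orientation as forward units.
    """
    pattern = re.compile(
        f"{re.escape(left)}([ACGT]{{{length}}}){re.escape(right)}"
    )
    read = read.upper()
    return pattern.findall(read) + pattern.findall(reverse_complement(read))


def tabulate_inserts(reads):
    """Count how often each insert occurs across all concatemer reads."""
    counts = Counter()
    for read in reads:
        counts.update(extract_inserts(read))
    return counts


if __name__ == "__main__":
    unit = LEFT_FLANK + "AAACCCGGGTTTAAACCCGGG" + RIGHT_FLANK
    demo_read = unit + reverse_complement(unit)
    for insert, n in tabulate_inserts([demo_read]).most_common():
        print(insert, "#" * n)  # crude text histogram
```

A fixed insert length is the simplest case; a library with variable-length inserts would need a length range in the pattern instead.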

The present invention allows the easy analysis of many (often all) of the variable
inserts present in the gene library population. This permits the evolution of the
selection process to be followed much more accurately. It also ensures that consensus
matrix-binding sequences are identified both earlier and more accurately. Also
important is that the method of the present invention overcomes the problem of
clonal dominance, in which the emergence of a single family of binding sequences
prevents analysis of interactions on complex matrices.


The present invention allows the rapid and complete identification of all linear or 
cysteine cyclised peptides that exhibit a specific behaviour permitting gene selection 
(either positive or negative). It is also applicable to the classification of all antibody 
epitopes in a complex humoral response to a pathogen. 



The source of the variable gene inserts in the library is nucleic acid. The nucleic acid
may be derived from one or more of a bacterium, a virus, a peptide-mimetic, an
immunoglobulin or cells, including infected cells and tumour cells.

The specific behaviour permitting gene selection (also described herein as a specific
characteristic permitting gene selection) may be any behaviour, including the fact that
the gene encodes a particular protein which binds to another protein in question.
Alternatively, the gene may encode a protein sequence which only occurs in one state
of tissue in comparison with the same tissue in a different state, for example normal
versus tumour tissue, infected versus non-infected tissue, wild type versus mutant,
healthy versus oxidatively damaged, healthy versus ischaemic, or occurrence during
one time period but not during another.

The essence of the proposed invention is a method based on concatenation of short
PCR products for efficient sequencing; this permits the analysis of hundreds if not
thousands of sequences corresponding to peptides selected at each round of a
target-directed evolution from a large combinatorial library. In its simplest form this
method reveals multiple sequence families as they are enriched by selection; a single
series of enrichment experiments generates frequency histograms for all emergent
classes of selected sequence. In this way, multiple binding sites on a complex target are
identified without using the time-consuming and expensive iterative panning
approach.
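Following enrichment across rounds amounts to converting the insert counts from each round into relative frequencies and lining them up per sequence. The sketch below is one possible way to do this, assuming the counts for each round are already available as a mapping from insert sequence to count; the function name and data layout are illustrative assumptions.

```python
from collections import Counter


def frequency_trajectories(rounds):
    """Relative frequency of every insert in each selection round.

    `rounds` is a list of mappings (insert sequence -> count), one per
    round, in the order the rounds were performed. Returns a dict mapping
    each insert to a list of frequencies, one entry per round; an insert
    not observed in a round has frequency 0.0 for that round.
    """
    all_inserts = set().union(*rounds)
    totals = [sum(counts.values()) for counts in rounds]
    return {
        insert: [
            counts.get(insert, 0) / total if total else 0.0
            for counts, total in zip(rounds, totals)
        ]
        for insert in all_inserts
    }


if __name__ == "__main__":
    # Invented counts for illustration only.
    demo = [
        Counter({"A": 1, "B": 1, "C": 2}),
        Counter({"A": 3, "B": 1}),
    ]
    for insert, freqs in sorted(frequency_trajectories(demo).items()):
        print(insert, freqs)
```

Several sequence families rising together across rounds would show up as several trajectories increasing at once, rather than a single dominant clone.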

The present invention can be utilised in various ways. For example, differential
panning on two states (normal versus tumour tissue; infected versus non-infected;
wild-type versus mutant; healthy versus oxidatively damaged; healthy versus
ischaemic, etc) together with frequency histogram generation on large insert numbers
at each round of panning offers a new type of information. The frequency histograms
of the two independent panning experiments are compared (in a manner analogous to
comparing two SAGE tag profiles, or microarray binding data from two mRNA
samples). This identifies peptide binders that are state independent (i.e. lying along the
diagonal on the two-state plot) as well as binders that are enriched in one state or the
other. This already enriches the data obtained very substantially. However, by
adding a third dimension that identifies the time of appearance of a given sequence
during the multiple rounds of panning, an entirely new type of information emerges. It
is now possible to perform cluster analysis on points that lie off the diagonal within
this 3-D space to identify groups of weak signals that together offer a discriminant
measure between the two states. To take a specific example, this approach could be
used to search for small groups of peptide mimetics that can together discriminate
between normal and tumour tissue in a way that could not be achieved by analysis of
binding of any single peptide.
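The two-state comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis prescribed by the specification: the log2 ratio, the pseudocount of 0.5 and the two-fold threshold are arbitrary choices made here to decide whether a sequence lies near the diagonal, and the function names are hypothetical. The second function records the round in which each sequence first appears, which supplies the third coordinate described above; the cluster analysis itself is not shown.

```python
from collections import Counter
from math import log2


def compare_states(counts_a, counts_b, pseudocount=0.5, threshold=1.0):
    """Label each insert by its relative abundance in two panning states.

    Returns {insert: (log2_ratio, label)}. A pseudocount avoids division
    by zero for inserts seen in only one state. Inserts whose log2 ratio
    lies within +/- `threshold` are treated as lying near the diagonal.
    Pseudocount and threshold are illustrative assumptions.
    """
    total_a = sum(counts_a.values()) + pseudocount
    total_b = sum(counts_b.values()) + pseudocount
    result = {}
    for insert in set(counts_a) | set(counts_b):
        freq_a = (counts_a.get(insert, 0) + pseudocount) / total_a
        freq_b = (counts_b.get(insert, 0) + pseudocount) / total_b
        ratio = log2(freq_a / freq_b)
        if ratio > threshold:
            label = "enriched_in_A"
        elif ratio < -threshold:
            label = "enriched_in_B"
        else:
            label = "state_independent"
        result[insert] = (ratio, label)
    return result


def first_round_seen(rounds):
    """Map each insert to the 1-based round in which it first appears."""
    first = {}
    for number, counts in enumerate(rounds, start=1):
        for insert in counts:
            first.setdefault(insert, number)
    return first


if __name__ == "__main__":
    # Invented counts for illustration only.
    state_a = Counter({"X": 50, "Y": 10, "Z": 40})
    state_b = Counter({"X": 50, "W": 40, "Z": 10})
    for insert, (ratio, label) in sorted(compare_states(state_a, state_b).items()):
        print(f"{insert}: log2 ratio {ratio:+.2f} -> {label}")
```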



Claims 



1. A method of determining the sequence or occurrence frequency of a number of
variable gene inserts from a gene library, wherein each variable gene insert is flanked
5' and 3' by known sequences, the method comprising:

conducting polymerase chain reaction to amplify the variable gene inserts to produce
components of a mixed PCR product;

ligating the components of the mixed PCR product to produce a concatenated 
sequence; and 

sequencing or determining the occurrence of the gene inserts in the concatenated 
sequence. 



2. A method as claimed in claim 1 wherein the gene library is a peptide phage 
display library. 

3. A method as claimed in claim 1 or claim 2, wherein the components of the 
mixed PCR product are digested with a restriction endonuclease before ligation to 
produce the concatenated sequence. 

4. A method as claimed in any one of claims 1 to 3, wherein the concatenated 
sequence is cloned into a plasmid before sequencing. 

5. A method as claimed in claim 4, wherein the concatenated sequence is size 
selected to around 1.5 kilobases in length before cloning into the plasmid. 

6. A method as claimed in any one of claims 2 to 5, wherein the number of 
variable gene inserts are from phage which exhibit a specific characteristic. 

7. A method as claimed in claim 6, wherein the specific characteristic is one or
more of protein binding or occurrence only in one state of tissue in a comparison with
the same tissue in a different state.

8. A method as claimed in any one of claims 2 to 5, wherein the number of 
variable gene inserts are from phage which do not exhibit a specific characteristic. 

9. A method as claimed in any one of claims 1 to 8, wherein the source of the
variable gene inserts is nucleic acid from one or more of a bacterium, virus,
peptide-mimetic, immunoglobulin or cells, including infected cells and tumour cells.

10. A method as claimed in any one of claims 1 to 9, wherein frequency analysis 
of sequential rounds of selection of variable gene inserts from a gene library is used to 
perform discriminant analysis of the states of the selection method. 

11. A determination of the sequence or occurrence frequency of a number of 
variable gene inserts from a gene library, obtained by a method according to any one 
of claims 1 to 10. 



